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Constructing test instances for Basis Pursuit Denoising 

Dirk A. Lorenz 



Abstract — The number of available algorithms for the so-called Basis 
Pursuit Denoising problem (or the related LASSO-problem) is large and 
keeps growing. Similarly, the number of experiments to evaluate and 
compare these algorithms on different instances is growing. 

In this note, we present a method to produce instances with exact 
solutions which is based on a simple observation which is related to the 
so called source condition from sparse regularization. 

EDICS: DSP-RECO, DSP-ALGO 

I. Introduction 

"Lately, there has been a lot of fuss about sparse approximation." 
is the beginning of the paper |30| from 2006 and this note could 
have started with the same sentence. Three different minimization 
problems have gained much attention. We follow |31| and denote 
them as follows: For a matrix A € R fcxn and b G R fc and positive 
numbers a, A and r we define the Basis Pursuit Denoising ( (7)) 
with constraint by 



min 1 1 a; 1 1 1 subject to \Ax — 6|| 2 < a, 

X 

the Basis Pursuit Denoising with penalty ( (7j) by 

min | \\Ax — b\\\ + A \\x\\^ , 



(QPa) 



and the LASSO (least absolute shrinkage and selection operator |29|) 
by 



min \\Ax — b\\ 2 subject to 



< T. 



(LS T ) 



All three problems are related: if we denote with xqp(A) a solution 
of \QP X \ , this also solves jBPjfor a = ||Aeqp(A) - 6|| 2 and jLSTJ 
for t = || a?Qp (A) || x (see e.g. |25|, |31|). However, this relation is 
implicit and relies in general on the knowledge of the solutions. 
Hence, it is not totally true that these problems are equivalent. 

One may argue, that < |BP CT ^ is harder than the other problems since 
its objective is nonsmooth and shall be minimized over a complicated 
convex set (e.g. projecting on this set is difficult). Moreover, one may 
argue, that ( |QP A ^ is harder than (EST} since the latter has a smooth 
objective (to be minimized over a somehow simple convex set) while 
the first has a nonsmooth objective. Computational experience with 
with these problems lead to the same conclusion. 

Recently, minimization problems similar to Basis Pursuit De- 
noising have appeared in several contexts, e.g. group sparsity (or 
joint sparsity) |13| , |26|, |32| for sparse recovery, nuclear norm 
minimization for low-rank matrix recovery [28| to name just two. 

A. Notation 

With 1 1 a; || p we denote the p-norm of a vector x S R", A T is the 
transpose of a matrix A, the range of a matrix A is denoted with 
TgA and with Sign(a;) we denote the multivalued sign, i.e. 



y e Sign(x) 




if Xi > 
if Xi < 
if Xi = 
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II. Construction of instances with known solution 

In this section we illustrate how instances (i.e. tuples (A, b, A)) 
can be generated, such that the solution x* of l |QP A [ ( is known up to 
machine precision. This is achieved by prescribing the solution x* 
(and the matrix A and the value A) and computing a corresponding 
right hand side b. 

The basis is the following simple observation which has a one-line 
proof: 

Lemma 1: Let A € R fcxn , A > and x* £ R" and let w e 
rgA T fulfill w £ Sign(a;*). Then it holds: If y is a solution to 
A T y — w and b is defined by b = Ay + Ax", then x* is a solution 

of W>S- 

Proof: Simply check 

-A T (Ax* -b) = —A T (Ax* - Ay - Ax") 

= XA T y = Xw £ ASign(:r*). 



(BP CT ) Hence x* fulfills the necessary and sufficient condition for optimality. 



Remark 2: The existence of the vector w is exactly the source 
condition used in sparse regularization of ill-posed problems. There 
one shows that a vector x* for which such a vector w exists can be 
reconstructed from noisy measurements b s with \\Ax^ — b s \\ 2 < S 
by solving l |QP A [ l with b s instead of 6 and A x S and that one 
achieves a linear convergence rate, i.e. for the solution a;* one gets 
\\x{ - x t \\ i = 0{8), see (TSJ, (TT), (23). 

The following corollary reformulates the above lemma in a way 
which is more suitable for an algorithmic reformulation. 

Corollary 3: Let {1, . . . , n} be partitioned into sets X, A+ and 
A- and let x* £ R" be any vector such that 



x* > 0, i £ A+ 

xl < 0, ie A- 

x\ = 0, i€l 
and let A > 0. Furthermore assume that y 6 R fc fulfills 

1, ie A+ 
-l, ieA- 
i e 1 



{A T y)r 
(A T y)i 



(2) 



\A T yU < 1, 



and define b — Ay + Ax* . Then x* is a solution of < |QP A [ ). 
According to this corollary we can construct an instance (A, b, A) 
with known solution x* as follows: 

1) Specify A £ R mxn an( j a sign-pattern (given by the partition 
A+, A-, I). 

2) Construct a vector y £ R m which fulfills (|2j. 

3) Choose any A > and any x* G R" which complies with the 
sign-pattern, i.e. (|TJ holds. 

4) Define b = Ay + Ax. 

The vector y can be constructed by several methods which are outline 
in Appendix [A| These methods have been implemented in the Matlab 
package LITestPack in the function construct_bpdn_rhs Q 
One should note that a vector y as in Corollary [3] need not to exist. 



The package is available at http://www.tu-braunschweig.de/iaa/personal/ 
lorenz/1 1 testpack 
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Indeed, for a fixed matrix A not every sign-pattern of x* can occur 
as a minimizer of any l |QP A [ l. 

Remark 4: For injective A everything is much simpler: Since A T 
is surjective, we can just choose some w £ Sign(:r*), solve A T y = w 
and set b — Xy + Ax* . 

We discuss advantages and disadvantages of our approach: 
Advantages: 

• The algorithm is independent of the value of A while the 
performance of solvers for < |QP A ^ usually deteriorates for smaller 
A, see, e.g. [TT), (TJ) and Section jUTA] 

• The algorithm is independent of the dynamic range of the 
optimal value x*, however, several experiments have recorded 
that the performance of solvers for l |QP A } depends greatly on 
the dynamic range, see, e.g. (4) and Section |III-C| 

• For square matrices A with full rank, one immediately get 
a desired vector y by solving A T y = w for some vector 
w G Sign(:r*). While this setting is unusual, e.g., in compressed 
sensing, one encounters such situations in regularization with 
sparsity constraints, see (6), (5), (TO), (18), (22), (27) . 

Disadvantages 

• The construction of b from x* leads to a specific noise model, 
namely, the noise is given by Ay. Hence, there is no control 
about the noise distribution^ This limits the use of instances 
constructed in this way to the comparison of solvers for basis 
pursuit denoising. For other sparse reconstruction methods like 
matching pursuit algorithms they seem to be useless. 

• The algorithm produces one particular element w £ Sign(:r*) 
and it is not clear if this has any additional properties. Usually, 
several w £ Sign(a:*) Plrgyl T exist and probably the proposed 
method favors a particular form of w. 

III. Illustrative instances 

Numerous papers contain comparisons of different solvers for the 
three problems ([BP7J, {QP7} and ( |LSTT >, see e.g. (3), (4), (TT), (IS) , 
1 20 1, 1 25 1, [31| , (34). Hence, we not aim at yet another comparison of 
solvers but try to illustrate, how different features of the measurement 
matrix and the solution influence the difficulty of the problem. 

From the zoo of available solvers we have chosen four. The 
choice was not uniformly at random but to represent four different 
classes: fpc [20| as a simple tuning of the basic iterative thresholding 
algorithm, FISTA (3) as a representative of the "optimal algorithms" 
in the sense of worst case complexity, GPSR [1TJ as a highly tuned 
basic gradient method and YALL1 (33) as a member of the class of 
alternating directions method^ All these solvers proceed iteratively 
and use (basically) one application of A and one of A T for each 
iteration. Hence, the runtime of these algorithms is mainly related to 
the number of iterations. We did not include higher order solvers like 
fss |21| or ssn [18| and also did not use any variant of homotopy 
approaches (24). 

For algorithms we overrode the implemented stopping criteria by 
the criterion that the relative error in the reconstruction 

_ \\Xn ~ X*\\ 
~ ||X*|| 

falls below a given threshold. 

2 However, one observes that the noise level \\Ax* — b\\ = A \\y\\ is pro- 
portional to A which, again, motivates that one should choose A proportional 
to the noise level. 

3 Sources: fpc version 2.0 http://www.caam.rice.edu/~optimization/Ll/fpc/ 
GPSR version 6.0 http://www.lx.it.pt/~mtf/GPSR7 YALL1 version 1.0 http: 
//yall Lblogs.rice.edu/ and an own implementation of FISTA. 




Fig. 2. Results for from Section [Ill-B | on the influence of the sparsity level 
s. 

A. Influence of the parameter A 

Here we consider a standard example from compressed sensing, 
namely a sensing matrix A which consists of random rows of a DCT 
matrix. The setup is as follows: 

Dimensions: 

• n = 1000 variables, 

• k — 200 measurements 
Matrix A: 

Random rows of a DCT matrix 
Solution x*: 

s = 20 non-zero entries, magnitude normally distributed 

with mean zero and variance one. 
A: 10 _ \ 10~ 2 , 10~ 4 
Results: 

In general, all solver slow down for smaller values of A. 
However, some solvers depend greatly on the size of A, see 
Figure [JJ 

B. Influence of the sparsity level 

While the construction of a test instance is independent of the 
parameter A, it gets harder for less sparsity. The behavior of the 
solvers with respect to the sparsity level is illustrated by this example: 

Dimensions: 

• n = 2000 variables, 

• k — 200 measurements 
Matrix A: 

Bernoulli ensemble, i.e. random ±1 
Solution x*: 

s = 4, 80 non-zero entries, respectively; magnitude nor- 
mally distributed with mean zero and variance one. 

A: 10" 1 

Results: 

Most solvers take longer for less sparsity; however, surpris- 
ingly, YALL1 is even faster for lower sparsity, see Figure [2] 

C. Influence of the dynamic range of the entries in x* 
As claimed in the introduction, the dynamic range 

_ maxpl : x* f_ 0} 
1 ' min{|a;1 : x* / 0} 
also influences the performance. 
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Fig. 1. Results for from Section [lII-A| on the influence of A. 



Dimensions: 

• n = 3000 variables, 

• k = 1000 measurements 
Matrix A: 

Union of three orthonormal basis: the identity matrix, the 
DCT matrix and an orthonormalized random matrix 
Solution x*\ 

s = 50 non-zero entries, with a dynamic range of approxi- 
mately 9, 701 and 55.000, respectively. 

A: 10" 1 

Results: 

Some solvers dramatically slow down for larger dynamic 
range, see Figure [5] 

D. Influence of the coherence of A 

To illustrate that also a large coherence can cause solvers to slow 
down, we have chosen the following setup: We considered square 
matrices A G R_ nx ™ which are zero expect on the diagonal and a 
certain number K of lower off-diagonals, scaled to have || A|| = 1: 

"1 01 



-4/. 







K columns 

We also considered the extreme case K = n, also known as the 
Heaviside matrix. Denoting the columns of Ak by a.j, we calculate 
the coherence of the matrix Ak as 



{di\ Oij) 
fl = max j. ry^ [j- 

i^j ai \\aj\\ 



K-l 
K ■ 



Dimensions: 

• n = 300 variables, 

• k = 300 measurements 
Matrix A: 

Increasingly coherent matrices with K = 5, 40, 100, 300 
Solution x*\ 

s — 30 non-zero entries, Bernoulli, i.e. randomly selected 
+ 1 and -1. 
A: 10" 1 



Results: 

This problem, while with an square and invertible matrix, 
is known the be notoriously hard. Especially for large K 
all solvers deteriorate, see Figure [4] 

Appendix A 
Algorithms 

Instead of y G R m we construct a vector w G R" such that 

w G rg A T n Sign(a;*) 

which can be reformulated as 

Wi = l, i G A+ 
Wi = — 1, i G A- 
\wi\ < 1, i £ I 

and w G rgA T . Then y can be found by solving A T y = w. 

A. Solution by projection onto convex sets 

The condition w G rgA T n Sign(x*) can be seen as a convex 
feasibility problem |1| since both the sets rgA T and Sign(a;*) are 
convex. Moreover, the projection onto each set is computationally 
feasible: The projection onto the range of A T can be calculated 
explicitly, e.g. with the help of QR factorization. If A T — QR 
with orthonormal Q and upper triangular R, the projection P rs A T 
is given by P Tg ^r = Q('-, 1 : k)Q(:,l : k) T . Projecting onto the 
convex set Sign(a;*) is even simpler: Set the fixed components to 
±1 respectively and clip the others by x i— > max(min(a;, l)x, — 1). 
We done the projection onto Sign(a;*) by Psign(^*)- 

Now we find w by alternatingly project an initial guess onto both 
sets, a strategy knows as projection onto convex sets (POCS) |8|, 
|19| . This is given as pseudo code in Algorithm [T] 

B. Solution by quadratic programming 

We sketch another approach by quadratic programming: We call 
A = A+ U A- the active set and I the inactive set and define 
s G R- 4 by 



Si = 1, i G A+ 
Si = — 1, i G A- 



(3) 



Furthermore we denote with P4 : R" — > TL A the projection which 
deletes the "inactive" components and with Pz : R n — > R 1 the 
projection which deletes in "active" components and the respective 
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Fig. 3. Results for from Section [lII-C| on the influence of the dynamic range. 
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Fig. 4. Results for from Section [lII-D| on the influence of the coherence. 



Algorithm 1 Calculation of y by POCS 

Require: Input A G R mx ™, a partition A+, A- and 1 of 
{1, . . . , n} (coded as Sign(a;*)), a tolerance e > and an initial 
guess wo- 
1: for i = 0, 1, . . . do 

2: D" = P rgi rt«" 

3: = P S ign(..)« n 

4: if max(||«™-w™]| , \\w n+1 -v n \\) < e then 

5: break 

6: end if 

7: end for 

8: Solve A T y = to 

9: return y 



adjoint Pj and which fill up the vectors be zeros. With this 
notation, we aim at finding w € rgA T such that 

P A w = s, and UPrw]!^ < 1. 

To fulfill the condition w G TgA T we use the orthogonal projection 
on rgA T , denoted by P rg A T an d require P IgA TW — w. Since w is 
determined on the active set A we rewrite is as 

w = Pis + P£z (4) 

with a z G R, , Putting this together we have to find a vector z G R 1 
such that 

(P rgA T -Id)Pj*= (Id -P rgA T)Pj S , II^L < 1. 



We the abbreviations 

P = (P rgA --Id)Px 
0= (Id -P rgA T)Pls 
we reformulate this as the optimization problem 

min 1 \\Pz - ull 2 s.t. II ^ II < 1. (6) 

This quadratic programming or constrained regression problem can 
be solved by various methods (5) including the simple gradient 
projection |15| or the conditional gradient method [2], |14|. Note 
that we require that the optimal value of {6| is indeed zero. 
Algorithm [2] gives pseudo-code for calculating y. 
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